METHOD FOR DETERMINING AND MODIFYING PROTEIN/PEPTIDE SOLUBILITY 

FIELD OF THE INVENTION 
The present invention relates generally to improving the solubility of 
proteins/peptides and, more particularly, to a method for identifying more or less soluble 
proteins/peptides from libraries of mutants thereof generated from the directed 
evolution of genes which express these proteins/peptides. This invention was made 
with government support under Contract No. W-7405-ENG-36 awarded by the U.S. 
Department of Energy to The Regents of the University of California. The government 
has certain rights in the invention. 

BACKGROUND OF THE INVENTION 
Protein insolubility constitutes a significant problem in basic and applied 
bioscience, in many situations limiting the rate of progress in these areas. Protein 
folding and solubility have been the subject of considerable theoretical and empirical 
research. However, there still exists no general method for improving intrinsic protein 
solubility. Such a method would greatly facilitate protein structure-function studies, drug 
design, de novo peptide and protein design and associated structure-function studies, 
industrial process optimization using bioreactors and microorganisms, and many 
disciplines in which a process or application depends on the ability to tailor or improve 
the solubility of proteins, screen or modify the solubility of large numbers of unique 
proteins about which little or no structure-function information is available, or adapt the 
solubility of proteins to new environments when the structure and function of the 
protein(s) are poorly understood or unknown. 

Overexpression of cloned genes using an expression host, for example E. coli, 
is the principal method of obtaining proteins for most applications. Unfortunately, many 
such cloned foreign proteins are insoluble or unstable when overexpressed. There are 
two sets of approaches currently in use which deal with such insoluble proteins. One 
set of approaches modifies the environment of the protein in vivo and/or in vitro. For 
example, proteins may be expressed as fusions with more soluble proteins, or directed 
to specific cellular locations. Chaperones may be coexpressed to assist folding 
pathways. Insoluble proteins may be purified from inclusion bodies using denaturants 
and the protein subsequently refolded in the absence of the denaturant. Modified 
growth media and/or growth conditions can sometimes improve the folding and 
solubility of a foreign protein. However, these methods are frequently cumbersome, 
unreliable, ineffective, or lack generality. A second set of approaches changes the 
sequence of the expressed protein. Rational approaches employ site-directed mutation 
of key residues to improve protein stability and solubility. Alternatively, a smaller, more 
soluble fragment of the protein may be expressed. These approaches require a priori 
knowledge about the structure of the protein, knowledge which is generally unavailable 
when the protein is insoluble. Furthermore, rational design approaches are best 
applied when the problem involves only a small number of amino-acid changes. 
Finally, even when the structure is known, the changes required to improve solubility 
may be unclear. Thus, many thousands of possible combinations of mutations may 
have to be investigated, leading to what is essentially an "irrational" or random 
mutagenesis approach. Such an approach requires a method for rapidly determining 
the solubility of each version. 
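For a sense of scale, the number of candidate variants can be counted directly. The sketch below assumes a hypothetical protein of 150 residues (an illustrative length, not one from this specification) and counts every substitution by one of the other 19 standard amino acids. For that length there are 150 × 19 = 2,850 single-substitution variants and C(150, 2) × 19² = 4,034,175 double-substitution variants, far too many to test one at a time by growth, lysis and gel analysis.

```python
from math import comb


def substitution_variants(length: int, k: int, alternatives: int = 19) -> int:
    """Count variants carrying exactly k amino-acid substitutions.

    Choose k of the `length` positions, then one of the `alternatives`
    non-wild-type residues at each chosen position.
    """
    return comb(length, k) * alternatives ** k


if __name__ == "__main__":
    # Hypothetical 150-residue protein; the length is illustrative only.
    for k in (1, 2, 3):
        print(f"{k} substitution(s): {substitution_variants(150, k):,}")
```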

Random or "irrational" mutagenesis redesign of protein solubility carries the 
possibility that the native function of the protein may be destroyed or modified by the 
inadvertent mutation of residues which are important for function, but not necessarily 
related to solubility. However, protein solubility is strongly influenced by interaction with 
the environment through surface amino acid residues, while catalytic activities and/or 
small substrate recognition often involve partially buried or cleft residues distant from 
the surface residues. Thus, in many situations, rational mutation of proteins has 
demonstrated that the solubility of a protein can be modified without destroying the 
native function of the protein. Modification of the function of a protein without affecting 
its solubility has also been frequently observed. Furthermore, spontaneous mutants of 
proteins bearing only 1 or 2 point mutations have been serendipitously isolated which 
have converted a previously insoluble protein into a soluble one. This suggests that the 
solubility of a protein can be optimized with a low level of mutation and that protein 
function can be maintained independently of enhanced solubility. 

DNA shuffling is described in Stemmer, Nature 370, 389 (1994), and in "DNA [...]". 
[...] nonessential mutations accumulated during directed evolution [...]. Directed 
evolution is further discussed in: "Searching [...]" ([...]); in "Directed Evolution: 
Creating Biocatalysts For The Future," by Frances H. Arnold [...]; in [...] et al., Proc. 
Natl. Acad. Sci. USA 94, 4504 (1997); and in "Functional And Nonfunctional Mutations 
Distinguished By Random Recombination Of Homologous Genes," by [...] and Frances 
H. Arnold, Proc. Natl. Acad. Sci. USA [...]. In these references, engineering new 
proteins from genes by random mutagenesis and recombination coupled with screening 
for improved variants is described. These references are not concerned with the use 
of directed evolutionary processes to improve the solubility of proteins; rather, the 
mutagenesis is directed to improvement of protein function. It should be noted, 
however, that in order for the protein to function properly in any environment, it must be 
correctly folded and, therefore, soluble. 

Finally, for structural determination it is often not necessary or even desirable to 
have a fully functional version of the protein. If the mutational rate is low (ensured by 
molecular backcrossing), it is likely that the structures of the wild-type and 
solubility-optimized versions of a protein will be similar. As long as the protein is soluble 
and a structure can be obtained, it should then be possible to redesign the solubility of 
the protein using rational methods. 

Green fluorescent protein has become a widely used reporter of gene expression 
and regulation. DNA shuffling has been used to obtain a mutant having a whole cell 
fluorescence 45 times greater than the standard, commercially available plasmid GFP. 
See, e.g., "Improved Green Fluorescent Protein By Molecular Evolution Using DNA 
Shuffling," by Andreas Crameri et al., Nature Biotechnology 14, 315 (1996). The 
screening process optimizes the function of GFP (green fluorescence), and thus uses a 
functional screen. Although the screening process coincidentally optimizes the 
solubility of the GFP, in that the GFP is only fluorescent when properly folded, there is 
no mention of using soluble GFP as a tag to monitor solubility of other proteins; that is, 
the function of the protein, and not its solubility, is being modified. In "Wavelength 
Mutations And Post-translational Auto-oxidation Of Green Fluorescent Protein," by 
Roger Heim et al., Proc. Natl. Acad. Sci. USA 91, 12501 (1994), GFP was mutagenized 
and screened for variants with altered absorption or emission spectra. The authors 
mention that in place of proteins labeled with fluorescent tags to detect location and 
sometimes their conformational changes both in vitro and in intact cells, a possible 
strategy would be to concatenate the gene for the nonfluorescent protein of interest 
with the gene for a naturally fluorescent protein and express the fusion product. 
However, the focus of this paper is the extension of the usefulness of GFP by enabling 
visualization of differential gene expression and protein localization and measurement 
of protein association by fluorescence resonance energy transfer, by making available 
two visibly distinct colors. There is no mention of the use of the gene construct for 
solubility determinations. The paper further discusses the expression of GFP in E. coli 
under the control of a T7 promoter, and that the bacteria contained inclusion bodies 
consisting of protein indistinguishable from jellyfish or soluble recombinant protein on 
denaturing gels, but that this material was completely nonfluorescent, lacked the visible 
absorbance bands of the chromophore, and did not become fluorescent when 
solubilized and subjected to protocols that renature GFP, as opposed to the soluble 
GFP in the bacteria which undergoes correct folding and, therefore, fluoresces. 

Chun Wu et al., in "Novel Green Fluorescent Protein (GFP) Baculovirus 
Expression Vectors," Gene 190, 157 (1997), describe the construction of Baculovirus 
expression vectors which contain GFP as a reporter gene. The authors follow the 
production and purification of a protein of interest by in-frame cloning of the gene that 
expresses the protein in insect cells with the GFP open reading frame, thereby 
permitting visualization of the produced GFP-fusion protein using UV light. However, 
the purified GFP-XylE fusion protein was found to be insoluble after harvest. 

In "Application Of A Chimeric Green Fluorescent Protein To Study 
Protein-Protein Interactions," by N. Garamszegi et al., Biotechniques 23, 864 (1997), 
the authors discuss the fusion between GFP and human calmodulin-like protein (CLP) 
and show that this protein retains fluorescence and the known characteristics of CLP. 
That is, the GFP portion remains responsible for efficient fluorescent signals with little or 
no influence on the properties of the fused protein of interest. The authors maintain 
that the exhibited GFP fluorescence provides information concerning the maintenance 
of the GFP structural integrity in the chimeric protein, but does not provide information 
about the integrity of the entire fusion protein and, in particular, does not allow any 
statements concerning the maintenance of CLP function or integrity. From these 
statements, it is clear that this paper does not contemplate the use of the GFP as a 
solubility reporter for the CLP. 

Accordingly, it is an object of the present invention to provide a solubility reporter 
for rapidly identifying soluble forms of proteins. 
Another object of the invention is to provide a method for modifying the solubility 
of proteins by generating large numbers of genetic mutants of the gene which encodes 
the protein to be solubilized, which can be expressed and the resulting proteins 
screened for solubility. 

Additional objects, advantages and novel features of the invention will be set 
forth in part in the description which follows, and in part will become apparent to those 
skilled in the art upon examination of the following or may be learned by practice of the 
invention. The objects and advantages of the invention may be realized and attained 
by means of the instrumentalities and combinations particularly pointed out in the 
appended claims. 

SUMMARY OF THE INVENTION 
To achieve the foregoing and other objects, and in accordance with the purposes 
of the present invention, as embodied and broadly described herein, the method for 
determining the solubility of a protein, P, of this invention may include the steps of: 
fusing a DNA fragment, [P], which codes for the protein with the DNA [R] which codes 
for a reporter protein, R, which can be detected in solution, forming thereby a fusion 
DNA fragment, [P-R], which codes for the fusion protein, P-R, such that the solubility of 
the P-R is determined by the solubility of protein, P; ligating the [P-R] fragment into an 
expression vector to form a plasmid DNA; and introducing the plasmid DNA into an 
expression host such that the fusion protein is overexpressed therein; whereby if the 
fusion protein P-R is in solution in the host, the reporter protein R can be detected, 
thereby indicating that the protein P is soluble. 

Preferably, the DNA fragment [P] is fused with the DNA fragment [L] which codes 
for a flexible linker peptide, L, which has been fused with the DNA fragment [R], forming 
thereby either fusion DNA fragment [P-L-R] or fusion DNA fragment [R-L-P], such that 
the solubility of the fusion proteins encoded by the [P-L-R] or the [R-L-P] is 
determined by the solubility of protein P. 

Preferably also, the DNA fragment bearing [L-R] or [R-L] is part of an expression 
vector and/or transfection/transformation vector enabling the fusion of [P] to yield the 
DNA fusions [P-L-R] or [R-L-P] as part of said vectors, thus enabling a host cell to 
express either the fusion protein P-L-R or the fusion protein R-L-P, such that the 
solubility of the fusion protein is determined by the solubility of protein P. 
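Because the readout depends on [P], [L] and [R] being translated as one polypeptide, the junctions must preserve a single reading frame, and the stop codon of every upstream partner must be removed. A minimal sketch of that bookkeeping follows, assuming each part is supplied as a plain coding-strand DNA string. The linker is the one given in EXAMPLE 1; P_TOY and R_TOY are hypothetical placeholder sequences, and the function names are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(orf: str) -> list[str]:
    """Split a coding sequence into codons; reject partial codons."""
    if len(orf) % 3:
        raise ValueError(f"length {len(orf)} is not a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def strip_stop(orf: str) -> str:
    """Remove a terminal stop codon, if present, so translation reads through."""
    codons = split_codons(orf)
    return "".join(codons[:-1]) if codons and codons[-1] in STOP_CODONS else orf


def fuse_in_frame(*parts: str) -> str:
    """Join coding parts N- to C-terminal (e.g. P, L, R or R, L, P).

    Stop codons are removed from every part except the last, and the
    result is checked for partial codons and internal stops.
    """
    fused = "".join([strip_stop(p) for p in parts[:-1]] + [parts[-1]])
    codons = split_codons(fused)
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("internal stop codon would truncate the fusion")
    return fused


LINKER = "GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC"  # linker L from EXAMPLE 1
P_TOY = "ATGAAACTGTAA"  # hypothetical ORF standing in for [P]
R_TOY = "ATGAGCAAGTAA"  # hypothetical ORF standing in for [R]

p_l_r = fuse_in_frame(P_TOY, LINKER, R_TOY)  # [P-L-R]
r_l_p = fuse_in_frame(R_TOY, LINKER, P_TOY)  # [R-L-P]
print(p_l_r)
print(r_l_p)
```

The same function builds either orientation by changing the argument order; in practice the start codon and cloning sites would be supplied by the vector.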

It is also preferred that the linker peptide is short, flexible, hydrophilic and 
soluble. 

Preferably also, the reporter protein includes green fluorescent protein. 

In another aspect of the present invention, and in accordance with its objects and 
purposes, the method for modifying the solubility of a protein, P, hereof may include the 
steps of: introducing mutations into [P], the DNA fragment which codes for the protein, 
generating thereby a combinatorial library of mutated variants, [X]; in-frame fusing 
individual [X] variants with a DNA construct such as a plasmid vector which includes a 
fragment which codes for a reporter protein, [R], which can be detected in solution, 
forming thereby a set of DNA constructs containing [X-R], which code for the fusion 
proteins, X-R, such that the solubility of each of the X-R proteins is determined by the 
solubility of the variant protein X contained therein; and introducing each of the DNA 
constructs into an expression host such that the fusion protein is overexpressed 
therein; whereby if one of the fusion proteins X-R is soluble in the host therefor, said 
reporter protein R can be detected, thereby indicating that the variant of the protein P is 
soluble. 

Preferably, the DNA fragment [X] is fused with the DNA fragment [L], which codes 
for a flexible linker peptide, L, which has been fused with the DNA fragment [R], 
forming thereby either fusion DNA fragment [X-L-R] or fusion DNA fragment [R-L-X], 
such that the solubility of the fusion proteins expressed by the [X-L-R] or the [R-L-X] is 
determined by the solubility of protein X. 

Preferably also, the DNA fragment bearing [L-R] or [R-L] is part of an expression 
vector and/or transfection/transformation vector enabling the fusion of [X] to yield the 
DNA fusions [X-L-R] or [R-L-X] as part of said vectors, thus enabling a host cell to 
express either the fusion protein X-L-R or the fusion protein R-L-X, such that the 
solubility of the fusion protein is determined by the solubility of protein X. 

It is preferred that the linker peptide is short, flexible, hydrophilic and soluble. 
Preferably also, the reporter protein includes green fluorescent protein. 
It is also preferred that the step of introducing mutations into [P], generating 
thereby a combinatorial library of mutated variants [X], is achieved using gene shuffling 
and directed evolution. 
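The mutate, fuse, express and screen cycle can be pictured as a loop that keeps the best clone of each library and stops when a round gives no improvement. The sketch below is a deliberately simplified model, not the method itself: it uses point mutation of a single parent instead of gene shuffling of a pool. The functions toy_score (standing in for the fluorescence of the fusion protein) and point_mutant, and the made-up TARGET sequence, are hypothetical and exist only for illustration.

```python
import random
from typing import Callable


def directed_evolution(parent: str,
                       score: Callable[[str], float],
                       mutate: Callable[[str, random.Random], str],
                       library_size: int = 200,
                       max_rounds: int = 20,
                       seed: int = 0) -> tuple[str, list[float]]:
    """Toy mutate-and-screen loop that stops when a round brings no improvement.

    `score` stands in for the reporter readout (e.g. fusion fluorescence);
    only the best-scoring clone of each library seeds the next round.
    """
    rng = random.Random(seed)
    best, best_score = parent, score(parent)
    history = [best_score]
    for _ in range(max_rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        top_score, top = max((score(v), v) for v in library)  # the "screen"
        if top_score <= best_score:
            break  # no further improvement: stop cycling
        best, best_score = top, top_score
        history.append(best_score)
    return best, history


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQR"  # hypothetical "most soluble" sequence, unknown to the screen


def toy_score(seq: str) -> float:
    """Hypothetical readout: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def point_mutant(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen residue with a random amino acid."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


best, history = directed_evolution("MAAAAAAAAA", toy_score, point_mutant)
print(best, history)
```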

Benefits and advantages of the present invention include the enhancement of 
the solubility of proteins of interest without having to individually test the solubility of 
each protein modification generated (such as by large-scale growth of each mutant in 
question followed by cell lysis, fractionation and sodium dodecyl sulfate polyacrylamide 
gel electrophoresis (SDS-PAGE)), and general applicability. 

BRIEF DESCRIPTION OF THE DRAWINGS 
The accompanying drawings, which are incorporated in and form a part of the 
specification, illustrate the embodiments of the present invention and, together with the 
description, serve to explain the principles of the invention. In the drawings: 

FIGURE 1 is a flow diagram illustrating the use of the solubility reporter 
according to the teachings of the present invention; if protein, P, is insoluble, the fusion 
protein, P-L-GFP, is insoluble, aggregated or bound in inclusion bodies, and is 
nonfluorescent, while if protein P is soluble, fusion protein P-L-GFP is soluble and 
fluorescent. 

FIGURE 2 is a flow diagram illustrating the generation of mutated versions of an 
arbitrary protein, P, which have enhanced solubility, employing fluorescence-assisted 
cell sorting to identify and select mutants with enhanced solubility. 

FIGURE 3 illustrates the performance of the GFP solubility reporter in E. coli 
BL21(DE3) induced by isopropyl-β-D-thiogalactopyranoside (IPTG) on Luria-Bertani 
(LB) media plates. 

FIGURE 4 illustrates the increase in fluorescence of clones expressing the fusion 
H-type ferritin-L-GFP during the process of directed evolution, using nutrient agar plates. 
DETAILED DESCRIPTION 
Briefly, the present invention utilizes a solubility reporter protein encoded by the 
in-frame fusion DNA fragment [P-L-GFP] or [GFP-L-P]. When overexpressed in the host 
cell, for example E. coli, the fusion protein P-L-GFP (made with L-GFP, GFP fused to the 
C-terminus of L) or GFP-L-P (made with GFP-L, GFP fused to the N-terminus of L) is 
soluble in the expression host, and fluorescent, only if P is soluble. The DNA encoding P 
is fused in-frame to a DNA fragment which encodes GFP-L or L-GFP, and the resulting 
fusion protein, P-L-GFP or GFP-L-P, is caused to be overexpressed in the host cell. The 
linkers in GFP-L and L-GFP are chosen such that the solubility of the P-L-GFP or 
GFP-L-P fusion protein is controlled by the solubility of P. [...] 
When the solubility reporter fragments [L-GFP] or [GFP-L] are used in a directed 
evolution of P, a mutated variant X is generated by gene shuffling, for example. The 
resulting pool of genes [X] encoding mutated proteins X is then genetically fused 
in-frame either with a pool of DNA constructs, such as vectors containing [L-GFP], to 
produce a pool of DNA constructs encoding fusion proteins X-L-GFP, or with a pool of 
DNA constructs containing [GFP-L] to produce a pool of DNA constructs encoding 
fusion proteins GFP-L-X, each fusion variant having solubility determined by X. After 
introducing the DNA into an expression host, such as by electroporation of the plasmid 
vectors into E. coli, individual variants with increased fluorescence (and hence 
increased solubility) may be screened and separated using fluorescence-assisted cell 
sorting, as an example. Millions of variants can be screened in one hour. Further 
cycles of directed evolution may be instigated until no further improvement in solubility 
is observed. Furthermore, mutations which are unnecessary for enhanced solubility, and 
which accumulated during the directed evolution, can be removed by recombination or 
backcrossing of the DNA encoding enhanced variants X of P against an excess of DNA 
encoding wild-type P, followed by selection of variants retaining enhanced solubility 
using said solubility reporter. Figure 2 is a schematic illustration of the generation of 
mutated versions of an arbitrary protein, P, which have enhanced solubility, employing 
fluorescence-assisted cell sorting (FACS) to identify and select mutants with enhanced 
solubility according to the teaching of the present invention. 
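The effect of backcrossing against an excess of wild-type DNA can be illustrated with a toy model. It assumes, for simplicity, that each mutation is inherited independently with probability 1/(1 + excess), ignoring linkage between nearby mutations, and that solubility requires a hypothetical set of essential mutations. In a real experiment that set is unknown, which is why the solubility reporter is used to screen the backcrossed library. The mutation positions below are invented for illustration.

```python
import random


def backcross_offspring(mutations: frozenset[int], wt_excess: float,
                        rng: random.Random) -> frozenset[int]:
    """One recombinant: each mutation is inherited with probability 1/(1 + wt_excess)."""
    p_keep = 1.0 / (1.0 + wt_excess)
    return frozenset(m for m in sorted(mutations) if rng.random() < p_keep)


def backcross_rounds(mutations: frozenset[int], essential: frozenset[int],
                     wt_excess: float = 5.0, library_size: int = 500,
                     rounds: int = 2, seed: int = 0) -> frozenset[int]:
    """Backcross repeatedly, keeping the cleanest clone that still screens soluble."""
    rng = random.Random(seed)
    current = frozenset(mutations)
    for _ in range(rounds):
        offspring = [backcross_offspring(current, wt_excess, rng)
                     for _ in range(library_size)]
        # Hypothetical screen: only clones retaining every essential mutation fluoresce.
        soluble = [o for o in offspring if essential <= o]
        if soluble:
            current = min(soluble, key=len)
    return current


EVOLVED = frozenset({12, 37, 58, 91, 104, 130, 151, 166})  # hypothetical mutated positions
ESSENTIAL = frozenset({37, 104})  # hypothetical; unknown in a real experiment

cleaned = backcross_rounds(EVOLVED, ESSENTIAL)
print(sorted(cleaned))
```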

To screen large numbers of versions of an arbitrary protein, it is desirable, but 
not essential, that reporter R be chosen to have the following characteristics: (1) the 
observed parameter for R, which indicates solubility of X-L-R and R-L-X, must not be 
observable independently of the solubility of X, nor be produced merely by the presence 
of X; (2) R should not dominate the solubility of X-L-R; (3) the solubility of X-L-R and 
R-L-X should be determined primarily by the solubility of X; (4) R should not assist the 
folding of X; (5) L should not significantly influence the solubility of R-L-X or X-L-R; and 
(6) L should not dominate the folding of any of X, R, X-L-R, or R-L-X. 



WO 99/31266 



PCT/US98/25862 






Having generally described the invention, the following EXAMPLES illustrate the 
application of the method of the present invention in greater detail. 

EXAMPLE 1 

As an example of the assembly of a construct which satisfies the above- 
described six criteria, a Bgl-II/Xho-I fragment of plasmid pET-21a(+), containing the T7 
promoter/lac operator sequence, the ribosomal binding site and the multiple cloning site, 
was ligated into the Bgl-II/Xho-I site of pET-28a(+). The resulting hybrid plasmid 
contained [...] the pET-28a(+) backbone. The pET-21a(+) and pET-28a(+) vectors were 
used as obtained from a commercial source. The vector was digested with Nde-I and 
BamH-I, and the small fragment was discarded and replaced with an in-frame stuffer 
such that the sequence, inclusive of the Nde-I and BamH-I sites, was 
[CATATGTGTAGACAGCTGGGATCC]. Next, the vector was digested with BamH-I and 
EcoR-I and the small stuffer was discarded. The BamH-I/EcoR-I site was filled with the 
DNA fragment [GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC], coding for the 
flexible linker L (GSAGSAAGSGEF). An improved variant of GFP was created by 
site-directed mutation, using recombinant PCR (see, e.g., "Recombinant PCR" by 
Russell Higuchi in "PCR Protocols, a Guide to Methods and Applications", Michael A. 
Innis, David H. Gelfand, John J. Sninsky, and Thomas J. White, eds., Academic Press, 
Inc., 177 (1990)), of the soluble variant of Crameri et al., supra, to yield the red-shifted 
S65T mutation (see, e.g., "Improved Green Fluorescence," by Roger Heim et al., Nature 
373, 663 (1995)), which improves the performance of the protein in FACS by increasing 
the absorption of 488 nm light by the fluorophore (near the argon laser emission 
commonly used for FACS). The internal Nde-I and BamH-I sites were abolished by 
silent mutation. The resulting GFP variant was amplified by PCR using the 5' primer 
[GATATAGAATTCAGCAAAGGAGAAGAACTTTTC], incorporating a 5' EcoR-I site, and the 
3' primer [GAATTCGGTACCTTATTTGTAGAGCTCTACCAT], incorporating a 5' Xho-I site. 
The resulting vector was digested with EcoR-I/Xho-I, the stuffer was discarded and 
replaced with the EcoR-I/Xho-I-digested EcoR-I:GFP:Xho-I amplicon, and the circular 
plasmid produced thereby was transformed by electroporation into the E. coli strain 
BL21(DE3) (genotype: F- ompT hsdSB (rB- mB-) gal dcm (DE3)), a commercially 
available strain. The construct in the pET vector system is inducible by IPTG. A 
transformant was used to inoculate a culture of LB and grown to an optical density 
(O.D.) at 600 nm of approximately 0.5; IPTG was added to a final concentration of 1 mM, 
and induction was allowed to proceed for 2 h. The bright green fluorescence, visible 
under room lighting, indicated that the fusion construct was soluble and well expressed. 
Next, the small in-frame stuffer fragment between Nde-I and BamH-I was removed by 
restriction digest and replaced by an out-of-frame stuffer with 3 translational stops. Cells 
expressing this fusion were non-fluorescent due to termination of translation prior to the 
GFP. Finally, the vector was digested with Nde-I + BamH-I to remove the stuffer and 
create a recipient site for Nde-I/BamH-I-flanked inserts. This recipient vector is 
subsequently referred to as the solubility-reporter vector. The specific examples 
described below use primers for the genes of interest which contain Nde-I (N-terminus) 
and BamH-I (C-terminus) sites. The use of an out-of-frame stuffer ensures that any 
vectors escaping digestion code for non-fluorescent constructs, and thus eliminates 
false positives. 
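The sequences given in this example can be checked in silico before cloning. A minimal sketch, assuming the standard genetic code and sequences written 5' to 3' on the coding strand: it translates the BamH-I/EcoR-I linker, reads the Nde-I/BamH-I stuffer from the ATG inside the Nde-I site, and joins the two at their shared BamH-I site. The helper names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Recognition sequences (all palindromic, so scanning one strand suffices).
SITES = {"Nde-I": "CATATG", "BamH-I": "GGATCC", "EcoR-I": "GAATTC", "Xho-I": "CTCGAG"}


def translate(dna: str) -> str:
    """Translate a coding-strand sequence in frame 0; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def find_sites(dna: str) -> dict[str, list[int]]:
    """0-based start positions of each recognition sequence in `dna`."""
    return {name: [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]
            for name, site in SITES.items()}


STUFFER = "CATATGTGTAGACAGCTGGGATCC"
LINKER = "GGATCCGCTGGCTCCGCTGCTGGTTCTGGCGAATTC"

print(translate(LINKER))              # GSAGSAAGSGEF
start = STUFFER.find("ATG")           # the ATG inside the Nde-I site
print(translate(STUFFER[start:]))     # MCRQLGS
print(find_sites(STUFFER))
# Stuffer and linker share the BamH-I site, so join them there.
print(translate(STUFFER[start:-6] + LINKER))  # MCRQLGSAGSAAGSGEF
```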

The response of the reporter system prepared as described hereinabove to two 
proteins (one highly soluble, the other highly insoluble), each of which is efficiently 
overexpressed in E. coli, is demonstrated in Fig. 3. A fusion to the highly soluble 
protein malE, which is widely used as a fusion partner to facilitate the purification of 
overexpressed proteins in E. coli, [malE-L-GFP], was selected to demonstrate the 
response of the reporter system to a soluble protein. A fusion construct with xylR, a 
highly insoluble bacterial regulator protein, [xylR-L-GFP], was chosen to demonstrate 
the response of the reporter system to an insoluble protein. The constructs were 
overexpressed in strain BL21(DE3): clones were allowed to grow on nitrocellulose 
membranes on LB agar plates containing kanamycin until colonies were 1-2 mm in 
diameter, and the membranes bearing the colonies were then transferred to LB agar 
plates containing kanamycin and the IPTG inducer to cause overexpression of the 
fusion proteins. Fig. 3a is a photograph, under long-wavelength UV illumination, of the 
resulting brightly fluorescent colonies in which the protein malE-L-GFP is overexpressed, 
while Fig. 3b is a photograph of the resulting weakly fluorescent colonies in which the 
protein xylR-L-GFP is overexpressed. 

The response of the solubility reporter system during improvement of the 
solubility of bullfrog H-ferritin by directed evolution of the expressed fusion construct, 
[ferritin-L-GFP], is shown in Fig. 4. The 6 clones of the ninth row are, from left to right: 
wild type (barely visible at the extreme left); the optima (brightest, most soluble) from 
cycles 1, 2, 3 and 4 of directed evolution; and the optimum from round 1 of backcrossing 
of the round 4 optima against the wild-type ferritin. The upper grid of 8 rows of 6 clones 
each (48 colonies) contains optima from a second round of backcrossing to remove 
non-essential mutations. With each cycle, the fluorescence (and hence solubility) 
improves. 

Figure 5 shows the use of an SDS-PAGE gel to illustrate the effectiveness of 
solubility reporters in a directed evolution process to improve the solubility of bullfrog H- 
type ferritin expressed in E. coli. Cultures expressing non-fusion constructs of ferritin 
alone were sonicated to lyse cells, and the soluble and insoluble fractions were 
separated by centrifugation. Fractions were resolved by SDS-PAGE; here, S = soluble 
(supernatant) fraction, P = insoluble (pellet) fraction, and M = molecular weight marker 
(10 kDa ladder). Lanes 1 and 2 are bullfrog L-type ferritin, a soluble protein used as a 
control; lanes 3 and 4 are insoluble wild-type bullfrog H-type ferritin; lanes 5 and 6 are 
the round 4 optimum variant of bullfrog H-type ferritin after 2 rounds of backcrossing 
against the wild type to remove spontaneous mutations not related to solubility. The 
improved solubility of the round 4 variant is seen by comparing lane 5 with the wild-type 
H-type ferritin in lane 3. The round 4 optimum (after 2 backcrossing rounds) was picked 
from row 3, column 2 of the plate shown in Fig. 4, and its improved solubility shows that 
the strong fluorescence from the solubility reporter is indeed related to the solubility of 
the fusion protein construct. 
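The S and P lanes of such a gel can also be quantified by densitometry. A minimal sketch, assuming background-corrected band intensities that are proportional to the amount of protein (i.e., within the linear range of the stain), and correcting for the share of each fraction that was loaded. The function name and the intensity values in the usage line are invented for illustration.

```python
def soluble_fraction(s_band: float, p_band: float,
                     s_loaded: float = 1.0, p_loaded: float = 1.0) -> float:
    """Fraction of a protein found in the soluble (S) rather than the pellet (P) fraction.

    s_band, p_band: background-corrected band intensities from the gel.
    s_loaded, p_loaded: share of each whole fraction that was loaded
    (only their ratio matters); used to scale bands back to whole fractions.
    """
    s_total = s_band / s_loaded
    p_total = p_band / p_loaded
    if s_total + p_total <= 0:
        raise ValueError("no signal in either fraction")
    return s_total / (s_total + p_total)


# Invented intensities; half as large a share of the pellet was loaded as of the supernatant.
print(round(soluble_fraction(30.0, 35.0, p_loaded=0.5), 3))  # 0.3
```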

EXAMPLE 2 

The above-described use of a solubility reporter can be analogously extended to 
determine the solubility of protein fragments. For example, to determine the solubility of 
fragments F of a protein P, the DNA [P] is subjected to a partial enzymatic digest (e.g., 
by DNase I in the presence of the divalent cations Mn2+ or Co2+) to create a pool of 
smaller fragments, [F]. The fragments can be polished with a proofreading polymerase 
bearing 3'-5' exonuclease activity to yield blunt ends, or subsequently given A-overhangs 
by treatment with a polymerase devoid of 3'-5' exonuclease activity in the presence of 
excess dATP (e.g., Taq polymerase). If desired, a particular size range of the 
fragments [F] may be selected, by agarose gel electrophoresis as an example. After 
ligation (e.g., blunt-end or T/A overhang) with the pool of appropriate recipient solubility 
reporter vector (e.g., bearing a blunt-end or T-overhang site flanked by [L-R]), some of 
the fragments [F] will form in-frame translational fusions, [F-L-R]. After transformation 
into an appropriate host (e.g., E. coli), expressed fusion proteins F-L-R which contain a 
soluble fragment F will be soluble, and detectable in the host by virtue of R (e.g., if R is 
GFP, the host cells will be fluorescent). Thus, the above-described 
solubility reporter method may be used to determine the solubility of a protein, its 
variants (mutants), and fragments thereof. 
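Only some ligated fragments give productive fusions. A minimal sketch of the bookkeeping follows, assuming the fragment is inserted in the forward orientation and that the vector's upstream reading frame begins exactly at the fragment's first base; reverse-orientation inserts are ignored. Under those assumptions a fragment reads through to [L-R] only if its length is a multiple of 3 and it contains no stop codon in that frame, and only fragments starting on a codon boundary of P encode a native-frame piece of P. With random fragment ends, only about one fragment in three meets the length condition, before stop codons and orientation are taken into account. The toy sequence P_TOY is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reads_through(fragment: str) -> bool:
    """True if the fragment, read from its first base, ends on a codon boundary with no stop."""
    if len(fragment) % 3:
        return False  # downstream [L-R] would be shifted out of frame
    return all(fragment[i:i + 3] not in STOP_CODONS for i in range(0, len(fragment), 3))


def classify_fragment(p_dna: str, start: int, end: int) -> str:
    """Classify the forward-oriented fragment p_dna[start:end] of [P]."""
    if not reads_through(p_dna[start:end]):
        return "no fusion"
    return "native frame" if start % 3 == 0 else "shifted frame"


P_TOY = "ATGGCTAAAGGTTAA"  # hypothetical [P]: Met-Ala-Lys-Gly-stop

for start, end in [(0, 12), (0, 15), (1, 13), (2, 14), (0, 10)]:
    print(start, end, classify_fragment(P_TOY, start, end))
```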

EXAMPLE 3 

EXAMPLE 1 has shown that GFP can be used as a solubility reporter. However, 
solubility reporters incorporating a translational fusion [P-L-R] include systems in which 
R is a protein/peptide other than GFP. When the fusion construct [P-L-R] is used, R 
can be a protein/peptide which gives a detectable signal, observable by chemical, 
biological or physical means, when linked to P-L as P-L-R. As an example, R could be 
the beta-galactosidase enzyme, lacZ. Clones expressing P-L-lacZ in which P is a 
soluble protein are detected by the enzymatic activity of lacZ (see, e.g., "Beta- 
Galactosidase Gene Fusions For Analyzing Gene Expression In Escherichia Coli And 
Yeast," by M. Casadaban et al., Methods Enzymol. 100, 293 (1983)) on substrates 
which yield a colored reaction product (for example, X-gal (5-bromo-4-chloro-3-indolyl- 
β-D-galactoside)). Colonies expressing fusion proteins with β-galactosidase activity 
turn blue on plates containing X-gal. Furthermore, in situations where the lacZ protein 
proves too large, the functionally complementable lacZα fragment is used as a 
substitute. The complementary fragment Δ-lacZ is provided by the host chromosome 
(for example, E. coli strain DH10B (F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 
ΔlacX74 deoR recA1 endA1 araD139 Δ(ara,leu)7697 galU galK λ- rpsL nupG), where 
the complementary fragment is provided by φ80dlacZΔM15). Fusion proteins 
P-L-lacZα containing a soluble protein P are soluble and contain a correctly folded 
lacZα, thereby leading to complementation of the Δ-lacZ fragment and restoration of 
lacZ β-galactosidase activity. 

EXAMPLE 4 

Reporter proteins R which have optimal activity when present in a non-fusion
context may also be employed for assays. The construct P-L-C-R is generated, where C is
a unique protease site. For example, C could be the viral protease cleavage site for the
plum pox virus NIa protease (see, e.g., M. Martin et al., "Determination of polyprotein
processing sites by amino terminal sequencing of nonstructural proteins encoded by
plum pox potyvirus", Virus Res. 15, 97 (1990)), and R could be the lacZα fragment.
The construct P-L-C-lacZα and the viral protease (NIa) could each be
expressed under the control of separately inducible promoters on separate plasmids
with compatible origins of replication. For an example of the use of multiple compatible
plasmids with cloning sites under independently controlled promoters, see R. Lutz and
H. Bujard, "Independent and tight regulation of transcriptional units in E. coli via the
LacR/O, the TetR/O and AraC/I1-I2 regulatory elements", Nucleic Acids Res. 25(6),
1203 (1997). The plasmids and required E. coli host strains are commercially
available. For example, the P-L-C-lacZα construct could be expressed under the control
of the tet promoter, and the NIa gene under the control of the arabinose
promoter/repressor. The plasmid(s) would be transformed into the appropriate E. coli
host (see Lutz, supra), and anhydrotetracycline added to the growth medium to induce
expression of P-L-C-lacZα. After the fusion protein P-L-C-lacZα has accumulated,
arabinose+IPTG is added to the growth medium to induce expression of the NIa
protease. P-L-C-lacZα is soluble, contains a correctly folded lacZα domain, and is
cleaved at site C only if P is soluble. The released lacZα then
complements the ΔlacZ fragment and restores lacZ β-galactosidase activity, which is
detected by standard colorimetric or fluorometric assays for β-galactosidase activity.

As another example, R might be an antibiotic selection marker such as the β-lactamase
gene (bla), which confers resistance to the penicillin-derived antibiotics commonly used with
cloning vectors. β-Lactamase carries a signal peptide and is translocated to
the periplasm of E. coli. Proper processing of the resistance protein
and its translocation to the periplasm would be impeded by an N-terminal fusion.
Cleavage by the protease, however, removes this obstacle. The P-L-C-β-lactamase fusion protein
would be soluble only if P were soluble. Concomitant induction with both
anhydrotetracycline and IPTG+arabinose would provide both the fusion protein
P-L-C-β-lactamase and the viral protease NIa.

In cells bearing soluble variants of P,
the fusion protein P-L-C-β-lactamase would be soluble and cleaved at C by
the NIa protease, releasing functional β-lactamase and thereby conferring
resistance to ampicillin. Conversely, in cells bearing insoluble
variants of P, the fusion protein would be insoluble, and the protease cleavage site C would be
buried in inclusion bodies and thereby inaccessible to the viral protease.
Furthermore, the β-lactamase domain would be buried in inclusion bodies, misfolded
and non-functional, so such cells would not be resistant to ampicillin. It
would be apparent to those having skill in the biochemical arts that cells
bearing soluble variants of P (and therefore having antibiotic resistance) could be
selected by challenging mixtures of the above-mentioned cells with the
selective agent (e.g., ampicillin) in the growth medium.

Moreover, it is
likewise apparent to one having skill in the art that both the fusion protein
P-L-C-β-lactamase and the protease NIa must be made continuously available to confer
antibiotic resistance throughout the life of the cell. Both genes must therefore be
simultaneously induced (in this example, by providing both anhydrotetracycline and
IPTG/arabinose in the growth medium). Cells with antibiotic resistance will survive,
thereby selecting for soluble variants of P. Furthermore, the
solubility of such variants could be improved further by increasing the concentration of
selective agent (e.g., ampicillin) during subsequent rounds of recombination and
selection.
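The sketch below is a toy model of the staged ampicillin selection described above. In this scheme, only cells carrying soluble variants of P release functional β-lactamase, and the selective agent concentration is raised in later rounds. The model makes one hypothetical assumption: the tolerated ampicillin level scales linearly with the soluble fraction of P-L-C-β-lactamase. All numbers, and the names `Variant`, `MAX_TOLERATED_AMP`, `select`, and `staged_selection`, are illustrative. The recombination step between rounds is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    soluble_fraction: float  # hypothetical fraction of P-L-C-beta-lactamase that stays soluble


# Hypothetical ampicillin level (ug/mL) a cell tolerates when all of its fusion
# protein is soluble and cleaved at C by NIa. Illustrative only.
MAX_TOLERATED_AMP = 500.0


def tolerated_ampicillin(v: Variant) -> float:
    """Toy assumption: only soluble fusion is cleaved, so the amount of released,
    functional beta-lactamase (and thus resistance) scales with the soluble fraction."""
    return MAX_TOLERATED_AMP * v.soluble_fraction


def select(pool, ampicillin):
    """Cells survive a challenge if the level they tolerate meets the ampicillin dose."""
    return [v for v in pool if tolerated_ampicillin(v) >= ampicillin]


def staged_selection(pool, schedule):
    """Apply one rising ampicillin concentration per round and record survivors.
    Recombination of survivors between rounds is not modeled here."""
    survivors = list(pool)
    history = []
    for amp in schedule:
        survivors = select(survivors, amp)
        history.append((amp, [v.name for v in survivors]))
    return survivors, history


if __name__ == "__main__":
    library = [Variant("wild_type", 0.05), Variant("m1", 0.35),
               Variant("m2", 0.60), Variant("m3", 0.90)]
    _, rounds = staged_selection(library, [50, 200, 400])
    for amp, names in rounds:
        print(amp, names)
```

With these made-up values, each dose increase removes the least soluble survivors. Only `m3` remains after the 400 µg/mL round. In the method described here, survivors would be recombined before the next, more stringent round.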

The foregoing description of the invention has been presented for purposes of
illustration and description and is not intended to be exhaustive or to limit the invention
to the precise form disclosed, and obviously many modifications and variations are
possible in light of the above teaching. For example, it would be apparent to one having
skill in biochemistry, after reviewing the present disclosure, that the method of the
present invention can be implemented using yeast and mammalian cells, wherein
fusion proteins P-L-GFP are expressed to create a solubility reporter. Similarly,
directed evolution for improving the solubility of proteins can be performed using insect
cells, and the required DNA manipulation according to the teachings of the present
invention can be achieved in vitro or in vivo.

The embodiments were chosen and described in order to best explain the 
principles of the invention and its practical application to thereby enable others skilled in
the art to best utilize the invention in various embodiments and with various
modifications as are suited to the particular use contemplated. It is intended that the
scope of the invention be defined by the claims appended hereto. 
WHAT IS CLAIMED IS:

1. A method for determining the solubility of a protein, P, which comprises
the steps of:

(a) fusing a DNA fragment, [P], which codes for said protein with the DNA
[R] which codes for a reporter protein, R, which can be detected, forming
thereby a fusion DNA fragment, [P-R], which codes for the fusion protein,
P-R, such that the solubility of P-R is determined by the solubility of
protein, P; and

(b) introducing said DNA into an expression system such that fusion 
protein P-R is overexpressed therein; whereby if fusion protein P-R is 
soluble in said expression system, reporter protein R can be detected, 
thereby indicating that protein P is soluble. 

2. The method for determining the solubility of a protein as described in 
claim 1, wherein DNA fragment [P] is fused with the DNA fragment which codes 
for a flexible linker peptide, [L], which has been fused with DNA fragment [R],
forming thereby a fusion DNA fragment selected from the group consisting of [P-L-R]
and [R-L-P], such that the solubility of fusion proteins expressed by said [P-L-R]
and said [R-L-P] are determined by the solubility of said protein P.

3. The method for determining the solubility of a protein as described in 
claim 2, wherein linker peptide, L is chosen to be short, flexible, hydrophilic and 
soluble. 

4. The method for determining the solubility of a protein as described in 
claim 1, wherein reporter protein R is selected from the group consisting of green 
fluorescent protein and variants thereof, lacZ, the lacZ-α fragment, and
selectable marker proteins.

5. The method for determining the solubility of a protein as described in 
claim 4, wherein said lacZ and lacZ-α fragments include enzymes having
chromogenic and fluorogenic substrates. 
6. The method for determining the solubility of a protein as described in claim
4, wherein said selectable marker proteins are selected from the group
consisting of ampicillin resistance proteins, tetracycline resistance proteins,
kanamycin resistance proteins and arsenic resistance proteins.

7. The method for determining the solubility of a protein as described in
claim 1, wherein said protein is a fragment of a larger protein and the DNA
which codes for said fragment is a fragment of the DNA which codes for said
larger protein.

8. The method for determining the solubility of a protein as described in
claim 7, wherein said DNA fragments which encode protein fragments of a larger
protein are generated using methods from the group consisting of partial DNASE
digest, radiation-induced fragmentation, chemical fragmentation, enzymatic
digest, endonuclease digest, exonuclease digest, acoustic/mechanical shearing,
and fragmentation.

9. The method for determining the solubility of a protein as described in
claim 8, wherein said DNA fragments are size selected before said step of fusing
said DNA fragment with the DNA [R] which codes for a reporter protein, R, using
methods selected from the group consisting of polyacrylamide gel
electrophoresis, agarose gel electrophoresis, capillary electrophoresis, and high
pressure liquid chromatography.

10. A method for modifying the solubility of a protein, P, which comprises the
steps of:

(a) introducing mutations into [P], the DNA fragment which codes for said
protein, generating thereby a combinatorial library of mutated variants [X];

(b) in-frame fusing individual [X] variants with a DNA construct which
contains [R], which codes for a reporter protein R which can be detected in
solution, forming thereby a set of DNA constructs containing [X-R], which
code for the fusion proteins, X-R, such that the solubility of each of said
X-R proteins is determined by the solubility of variant protein, X, contained
therein; and

(c) introducing each of said DNA constructs into an expression host such
that fusion proteins X-R are expressed therein; whereby if one of
said fusion proteins X-R is soluble in said host therefor, said reporter
protein R can be detected, thereby indicating that the mutated variant of
said protein P is soluble.

11. The method for modifying the solubility of a protein P as described in
claim 10, wherein DNA fragment [X] is fused with the DNA fragment which codes
for a flexible linker peptide, [L], which has been fused with said DNA fragment
[R], forming thereby a fusion DNA fragment selected from the group consisting of
[X-L-R] and [R-L-X], such that the solubility of said fusion proteins expressed by
said [X-L-R] and said [R-L-X] are determined by the solubility of protein X.

12. The method for modifying the solubility of a protein as described in claim 
11, wherein linker peptide, L is chosen to be short, flexible, hydrophilic and
soluble. 

13. The method for modifying the solubility of a protein as described in claim 
10, further comprising the step of collecting said expression hosts expressing X, 
a more soluble form of protein P than the form of protein P expressed by the 
wild-type DNA. 

14. The method for modifying the solubility of a protein as described in claim 
13, wherein said expression hosts containing a soluble form X of protein P are 
separated by fluorescence assisted cell sorting from said expression hosts which 
contain an insoluble form X of protein P, before said step of collecting said 
expression hosts expressing X. 

15. The method for modifying the solubility of a protein as described in claim 
13, wherein said expression hosts containing a soluble form X of protein P are 
separated from said expression hosts which contain an insoluble form X of 
protein P using nutrient agar plates, before said step of collecting said
expression hosts expressing X. 

16. The method for modifying the solubility of a protein as described in claim 
10, wherein reporter protein R is selected from the group consisting of green 
fluorescent protein and variants thereof, lacZ, the lacZ-α fragment, and
selectable marker proteins. 

17. The method for modifying the solubility of a protein as described in claim
16, wherein said lacZ and lacZ-α fragments include enzymes having
chromogenic and fluorogenic substrates.

18. The method for modifying the solubility of a protein as described in claim 
16, wherein said selectable marker proteins are selected from the group
consisting of ampicillin resistance proteins, tetracycline resistance proteins,
kanamycin resistance proteins and arsenic resistance proteins. 

19. The method for modifying the solubility of a protein as described in claim 
10, wherein said step of introducing mutations into [P], thereby generating a
combinatorial library of mutated variants [X], includes methods selected from the
group consisting of recombination, error-prone PCR, propagation in error-prone
host strains, doping mutagenesis, saturation mutagenesis, chemical
mutagenesis, irradiation mutagenesis, site-directed mutation, and combinations
thereof. 

20. The method for modifying the solubility of a protein as described in claim 
13, further comprising the step of recombining the DNA encoding [X] from each
of said collected expression hosts expressing a soluble form of said protein P,
thereby yielding a pool of variant DNA fragments [X] encoding mutants X of said
protein P with further enhanced solubility. 

21. The method for modifying the solubility of a protein as described in claim 
20, wherein said step of recombining the DNA encoding variants [X] with
enhanced solubility is accomplished using recombination. 
22. The method for modifying the solubility of a protein as described in claim 21,
wherein the recombination is achieved in vitro by gene shuffling.

23. The method for modifying the solubility of a protein as described in claim 21,
wherein the recombination is achieved in vivo by cell-mediated recombination. 

24. The method for modifying the solubility of a protein as described in claim 
10, wherein mutations which do not improve solubility are removed from the DNA
encoding protein X by recombination of the DNA encoding protein X with wild-type
DNA fragments, followed by selection of the most soluble variants.

25. The method for modifying the solubility of a protein as described in claim 
10, wherein said protein is a fragment of a larger protein and the DNA which 
codes for said fragment is a fragment of the DNA which codes for said larger
protein. 

26. The method for modifying the solubility of a protein as described in claim
25, wherein said DNA fragments which encode protein fragments of a larger
protein are generated using methods from the group consisting of partial DNASE 
digest, radiation-induced fragmentation, chemical fragmentation, enzymatic 
digest, endonuclease digest, exonuclease digest, acoustic/mechanical shearing, 
and fragmentation. 

27. The method for modifying the solubility of a protein as described in claim
26, wherein said DNA fragments are size selected before said step of fusing said
DNA fragment with the DNA [R] which codes for a reporter protein, R, using 
methods selected from the group consisting of polyacrylamide gel 
electrophoresis, agarose gel electrophoresis, capillary electrophoresis, and high 
pressure liquid chromatography. 